An Overview of Windows AI Foundry: Local AI Development and Deployment
Session Date: May 19, 2025
Duration: 1 hour
Venue: Build 2025 Conference - BRK223
Speakers: Tucker Burns (GPM, Windows Platform + Developer Team), Dian Hartono (Product Manager Lead, Windows Developer Platform Team)
Link: [Microsoft Build 2025 Session BRK223]

Executive Summary
This comprehensive session introduces Windows AI Foundry, Microsoft’s complete platform for local AI development on Windows. The session demonstrates three key components: Windows ML for custom model execution, Windows AI APIs for ready-to-use AI capabilities, and Foundry Local for open-source model deployment. Through extensive live demonstrations, Tucker Burns and Dian Hartono showcase how developers can leverage on-device AI across CPU, GPU, and NPU hardware while maintaining flexibility between local and cloud deployment models.
Key Topics Covered
1. The Case for Local AI: Beyond Cloud-Only Solutions
The Turning Point for Client-Side AI
Tucker’s Opening Statement: “We are at a turning point for local AI. Some applications will only run cloud models, while others will only run them locally, yet others will prefer a hybrid approach. We believe that power comes in flexibility.”
Key Drivers for Local AI Adoption
Compliance and Privacy Requirements:
- GDPR compliance - Data sovereignty and user control
- HIPAA regulations - Healthcare data protection requirements
- DMA compliance - Digital Markets Act data handling rules
- Full data control - Models run without data leaving the device
Performance and User Experience:
- Zero network latency - Critical for real-time applications
- Sensor proximity - Audio, video, and input processing benefits
- High availability - Operations without internet dependency
- Cost optimization - Local models don’t need cloud-scale infrastructure
Power Efficiency and Background Processing:
- NPU utilization - Dedicated neural processing units for efficient AI workloads
- Proactive processing - Constantly running models without performance impact
- New experience classes - Background AI capabilities enabling innovative features
2. Windows AI Foundry Architecture: Three-Pillar Platform
Comprehensive AI Development Stack
Windows AI Foundry
├── Windows AI APIs: Ready-to-use inbox models
├── Foundry Local: Open-source model catalog and execution
└── Windows ML: Foundation layer for custom models
Strategic Platform Benefits:
- Built-in and third-party model support - Flexibility in model selection
- Versatile development environment - Multiple approaches for different use cases
- Cross-silicon compatibility - CPU, GPU, NPU execution across hardware vendors
- Hybrid deployment options - Seamless switching between local and cloud inference
3. Windows ML: The Foundation Layer
Public Preview Announcement
Major Release: Windows ML now available in public preview
Core Capabilities:
- ONNX Runtime powered - Industry-standard model execution framework
- Cross-silicon execution - CPU, GPU, NPU support with automatic optimization
- Flexible model support - PyTorch, custom models, Hugging Face catalog integration
- Out-of-box runtime - Ships with the OS, so apps don't embed runtime binaries, reducing application size
Developer Workflow and Tooling
AI Toolkit for VS Code Integration:
- Model conversion - PyTorch to ONNX or silicon-specific formats (see the sketch after this list)
- Optimization scripts - Pre-built optimizations for common architectures
- Custom optimization - Starter templates for specialized model tuning
- Quality evaluation - Built-in model testing and validation
- Application integration - Windows ML NuGet package integration
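For orientation, here is a minimal sketch of what the model-conversion step boils down to, using plain torch.onnx.export rather than the AI Toolkit's managed workflow; the ResNet-18 model, tensor shapes, and opset version are illustrative placeholders, not the session's configuration.

```python
# Minimal PyTorch -> ONNX export, approximating what the AI Toolkit's
# conversion step does under the hood. Model and shapes are illustrative.
import torch
import torchvision.models as models

model = models.resnet18(weights=None)       # stand-in for your custom model
model.eval()
dummy_input = torch.randn(1, 3, 224, 224)   # example input matching the model

torch.onnx.export(
    model,
    dummy_input,
    "model.onnx",
    input_names=["input"],
    output_names=["logits"],
    dynamic_axes={"input": {0: "batch"}},   # allow variable batch size
    opset_version=17,
)
```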
Live Demo: AI Dev Gallery Image Classification
Device Policy Demonstration:
- Manual control - Explicit CPU, GPU, or NPU selection
- Smart policies - Automatic hardware selection based on optimization goals (see the sketch after this list):
  - Max efficiency - Prioritizes power-efficient execution
  - Max performance - Optimizes for speed and throughput
  - Minimize power - Favors the lowest overall power draw to preserve battery life
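To make the device policies concrete, here is a minimal sketch using plain ONNX Runtime, which Windows ML is built on. The provider names are standard ONNX Runtime identifiers; treat the mapping to the session's policy names as an assumption.

```python
# Explicit hardware selection with ONNX Runtime (the engine under Windows ML).
# Preference order approximates a "max performance with CPU fallback" policy:
# NPU (QNN, e.g. Snapdragon) -> GPU (DirectML) -> CPU.
import numpy as np
import onnxruntime as ort

preferred = ["QNNExecutionProvider", "DmlExecutionProvider", "CPUExecutionProvider"]
available = ort.get_available_providers()
providers = [p for p in preferred if p in available]

session = ort.InferenceSession("model.onnx", providers=providers)
inputs = {session.get_inputs()[0].name: np.random.rand(1, 3, 224, 224).astype(np.float32)}
outputs = session.run(None, inputs)
print("ran on:", session.get_providers()[0], "output shape:", outputs[0].shape)
```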
Technical Implementation:
// Core Windows ML Implementation Pattern
1. Reference infrastructure package
2. Download execution providers (via Microsoft Store)
3. Register execution providers with runtime
4. Set device selection policy
5. Compile model for target hardware
6. Execute inference with optimized model
Developer Feedback and Impact
Industry Testimonials:
- Development timeline reduction - “5x faster” deployment across silicon platforms
- Integration simplification - Reduced complexity for ISVs and enterprise developers
- Time-to-market improvement - Weeks reduced to days for multi-platform AI deployment
4. Windows AI APIs: Ready-to-Use AI Capabilities
Inbox Model Integration
Distribution and Management:
- Windows Update delivery - Models distributed via OS updates
- Copilot+ PC optimization - Enhanced performance on new hardware
- API abstraction layer - Developers don’t need to manage underlying models
- WinApp SDK delivery - Integrated into standard Windows development frameworks
Comprehensive API Portfolio
Vision APIs:
- Image Super Resolution - Intelligent image scaling and enhancement
- Image Segmentation - Background removal and object isolation
- Object Erase - Selective content removal from images
- Image Description - Natural language image analysis and captioning
- Text Recognition (OCR) - Text extraction from images and documents
Language APIs:
- Text Generation - Phi Silica-powered content creation
- Conversation Summarization - Key point extraction and meeting summaries
- Content Moderation - Automatic content safety and compliance checking
Live Demo: Image Description API
Real-World Comparison:
Human Description: "A simple one-bedroom apartment"
AI Model Output: "The image shows a simple, minimalistic floor plan of a
small apartment or studio with a kitchenette, one bedroom, and one bathroom."
Development Integration Flow:
- API capability check - Verify model availability on device
- Model instantiation - Create API instance with required parameters
- API invocation - Execute model inference with input data
- Result processing - Handle structured output from AI model
- Visual Studio export - Direct integration into development projects
5. Customization and Fine-Tuning Capabilities
Two-Track Customization Strategy
LoRA Fine-Tuning:
- Lightweight adaptation - Nudge models toward specific domains or tones (a sketch follows this list)
- Company voice optimization - Align output with organizational communication style
- Workflow specialization - Adapt models for specific business processes
- Technical terminology - Enhanced understanding of domain-specific language
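For readers who want to see what a LoRA adapter looks like outside the AI Toolkit's managed workflow, here is a sketch using the open-source Hugging Face PEFT library; the base model and target modules are illustrative assumptions, not the session's configuration.

```python
# A LoRA adapter attached to a small causal LM via Hugging Face PEFT.
# Only the low-rank adapter matrices are trained; base weights stay frozen.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("microsoft/Phi-3-mini-4k-instruct")
config = LoraConfig(
    r=8,                          # rank of the update matrices: small and cheap
    lora_alpha=16,                # scaling applied to the adapter's output
    target_modules=["qkv_proj"],  # which projections get adapters (model-specific)
    lora_dropout=0.05,
)
model = get_peft_model(base, config)
model.print_trainable_parameters()  # typically well under 1% of base weights
```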
Knowledge Retrieval (RAG):
- Semantic search powered - Intelligent information retrieval from local data (a retrieval sketch follows this list)
- Private knowledge grounding - Answers based on proprietary documents and content
- Dynamic content handling - Real-time access to changing information
- Multi-modal support - Text, image, and document integration
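The retrieval core of this pipeline is small enough to sketch directly. In the sketch below, embed() is a placeholder for whatever local embedding model the application uses; it is an assumption for illustration, not an API from the session.

```python
# Semantic search core of a RAG pipeline: embed documents and a query,
# rank by cosine similarity, and pass the top hits to the model as context.
import numpy as np

def embed(text: str) -> np.ndarray:
    raise NotImplementedError("placeholder: call your local embedding model")

def top_k(query: str, docs: list[str], k: int = 3) -> list[str]:
    q = embed(query)
    scored = []
    for d in docs:
        v = embed(d)
        scored.append((float(q @ v / (np.linalg.norm(q) * np.linalg.norm(v))), d))
    scored.sort(reverse=True)           # highest cosine similarity first
    return [d for _, d in scored[:k]]
```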
Live Demo: LoRA Fine-Tuning Workflow
AI Toolkit Integration:
- Project creation - Define fine-tuning objectives and model selection
- Dataset preparation - Training and test data for custom scenarios
- Azure integration - Cloud-based training with local model deployment
- Evaluation and testing - Quality assessment through AI Dev Gallery
- Production deployment - Seamless integration into applications
Use Case: Feedback Categorization
Input: "This app is awesome, but it needs a better Get Started icon"
Before fine-tuning: Generic response
After LoRA adapter: Categorized as "Compliment + Feature Request"
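As a rough illustration, a training set for this scenario might be a handful of JSONL prompt/completion pairs like the following; the labels and format are assumptions for illustration, not the session's actual dataset.

```python
# Writing a tiny feedback-categorization training set as JSONL pairs.
import json

examples = [
    {"prompt": "This app is awesome, but it needs a better Get Started icon",
     "completion": "Compliment + Feature Request"},
    {"prompt": "The export button crashes the app every time",
     "completion": "Bug Report"},
]

with open("feedback_train.jsonl", "w", encoding="utf-8") as f:
    for ex in examples:
        f.write(json.dumps(ex) + "\n")
```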
Live Demo: Knowledge Retrieval (RAG)
Contoso Note App Implementation:
- Multi-modal content - Text documents, images, and mixed media indexing
- Semantic search - Meaning-based rather than keyword-based retrieval
- Natural language queries - “Find me a vegetarian recipe and turn it vegan”
- Contextual responses - Relevant images and content alongside text answers
6. Foundry Local: Open-Source Model Ecosystem
Public Preview Features
Azure AI Foundry Integration:
- Extensive model catalog - Pre-optimized models for local execution
- Cross-platform compatibility - CPU, NPU, GPU support across hardware vendors
- Extensible platform - Support for multiple model catalogs beyond Azure
- No complex setup - winget install Microsoft.FoundryLocal for immediate access
Command-Line Interface Capabilities
Developer-Friendly Tools:
# Check available models for current hardware
foundry model list
# Download and run model locally
foundry model run <model-name>
# Check cached models on device
foundry model cache
# Interactive chat mode
foundry model run phi-4-mini-reasoning
Live Demo: Reasoning Model Execution
Local Inference Example:
Query: "Tucker has one computer. There are four total. How many does Dian have?"
Model Response: Step-by-step logical reasoning with final answer
Hardware: Local NPU execution with real-time processing
SDK and API Integration
Seamless Cloud-to-Local Migration:
// Cloud endpoint configuration
const cloudEndpoint = "https://api.azure.com/openai";
const cloudModel = "phi-4-reasoning";
// Local endpoint with three lines of code
import { FoundryLocalManager } from 'foundry-local-sdk';
const manager = new FoundryLocalManager();
const localEndpoint = await manager.getEndpoint("phi-4-mini");
Model Management Benefits:
- Single instance sharing - Multiple applications share one model copy
- Automatic optimization - Hardware-specific performance tuning
- Storage efficiency - Reduced disk usage across applications
- Memory management - Intelligent loading and unloading of models
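To illustrate the hybrid, cloud-to-local pattern end to end, here is a sketch using the standard OpenAI client against OpenAI-compatible endpoints, trying local first and falling back to cloud. The local port, model names, and cloud endpoint are placeholders; obtain the real local endpoint from the Foundry Local SDK as shown above.

```python
# Local-first inference with cloud fallback via OpenAI-compatible endpoints.
from openai import OpenAI

LOCAL_ENDPOINT = "http://localhost:5273/v1"      # placeholder: ask the SDK for the real value
CLOUD_ENDPOINT = "https://api.azure.com/openai"  # placeholder from the session's example

def chat(prompt: str) -> str:
    for base_url, model, key in [
        (LOCAL_ENDPOINT, "phi-4-mini", "unused-locally"),       # try local first
        (CLOUD_ENDPOINT, "phi-4-reasoning", "YOUR_AZURE_KEY"),  # then fall back to cloud
    ]:
        try:
            client = OpenAI(base_url=base_url, api_key=key)
            resp = client.chat.completions.create(
                model=model,
                messages=[{"role": "user", "content": prompt}],
            )
            return resp.choices[0].message.content
        except Exception:
            continue  # endpoint unreachable or model missing; try the next one
    raise RuntimeError("no inference endpoint reachable")
```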
Technical Architecture Deep Dive
Hardware Acceleration Strategy
Cross-Silicon Support:
- Intel - CPU and integrated graphics optimization
- AMD - Ryzen processors and Radeon GPU acceleration
- NVIDIA - CUDA and Tensor Core utilization
- Qualcomm - NPU-optimized execution for Snapdragon platforms
Model Deployment Pipeline
End-to-End Model Lifecycle:
Model Selection → Optimization → Local Deployment → Runtime Execution
├── Azure AI Foundry Catalog
├── AI Toolkit preprocessing
├── Windows ML runtime
└── Hardware-specific acceleration
Security and Privacy Architecture
On-Device Processing Benefits:
- Zero data transmission - All processing occurs locally
- Compliance alignment - GDPR, HIPAA, DMA requirements met by design
- Enterprise control - IT policies can govern local AI usage
- Audit capabilities - Complete visibility into AI operations
Live Demonstration Results
AI Dev Gallery Showcase
Image Classification Performance:
- Real-time inference across CPU, GPU, NPU hardware
- Dynamic policy adjustment for power vs. performance optimization
- Visual Studio integration with one-click project export
- Cross-platform model compatibility testing
Windows AI APIs Integration
Image Description Accuracy:
- Detailed spatial analysis - Room layout and furniture recognition
- Contextual understanding - Apartment type and feature identification
- Natural language output - Human-readable descriptions
- Real-time processing - Sub-second inference on local hardware
Foundry Local Performance
Multi-Model Support:
- Qwen, Phi, Mistral, DeepSeek - Diverse model ecosystem
- Reasoning capabilities - Complex logical problem solving
- Interactive chat modes - Real-time conversation interfaces
- Resource optimization - Shared models across applications
Industry Impact and Adoption
Developer Community Feedback
Quantified Benefits:
- 5x faster development - From weeks to days for multi-platform deployment
- Timeline reduction - Massive cuts in development cycles
- Integration simplification - Reduced complexity for ISVs
- Market acceleration - Faster time-to-market for AI features
Production Deployments
Windows Features Powered by AI APIs:
- Click to Do - Enhanced user interaction capabilities
- Windows Search improvements - Intelligent search and discovery
- Outlook email summarization - On-device conversation analysis
- Gaming content creation - Real-time highlight reel generation
Web Standards Integration
Microsoft Edge and Browser Support:
- Prompting APIs - Web-based AI interaction capabilities
- Writing assistance - Browser-native AI-powered content creation
- Web standards proposal - Industry-wide API standardization efforts
- Cross-platform compatibility - Consistent AI experiences across environments
Session Highlights
“We are at a turning point for local AI… We believe that power comes in flexibility.” - Tucker Burns
“With power efficient NPUs, you can run models proactively or constantly in the background with no regrets. This enables a new class of experiences.” - Tucker Burns
“Windows ML cuts dev timelines massively, letting us focus on delighting gamers.” - Developer Testimonial
“Five times faster… we spent two weeks just getting our AI feature to work on NPUs. With Windows ML we got everything working on three different chip platforms in just three days.” - Luyan Zhang, Filmora
“For us, the Holy Grail is being able to take a single high precision model and have it just work seamlessly across the range of Windows silicon.” - Aidan Fitzpatrick, Rewind AI
Implementation Guide
Getting Started with Windows ML
1. Development Environment Setup
**Required Tools:**
- Visual Studio or VS Code with AI Toolkit extension
- Windows 11 with latest updates
- Windows ML NuGet package
- AI Dev Gallery from Microsoft Store
**Basic Implementation Pattern:**
1. Reference Microsoft.AI.MachineLearning package
2. Download execution providers via downloadPackagesAsync()
3. Register execution providers with runtime
4. Set device selection policy (CPU/GPU/NPU)
5. Compile model for target hardware
6. Execute inference with optimized performance
2. Windows AI APIs Integration
// Image Description API Pattern
// 1. Check model availability
if (ImageDescriptionModel.IsSupported())
{
    // 2. Create model instance
    var model = await ImageDescriptionModel.CreateAsync();

    // 3. Process input with required parameters
    var result = await model.DescribeImageAsync(imageData);

    // 4. Handle structured output
    ProcessDescription(result.Description);
}
3. Foundry Local Setup
# Installation and basic usage
winget install Microsoft.FoundryLocal
# Browse available models
foundry model list
# Download and run specific model
foundry model run phi-4-mini-reasoning
# Check local model cache
foundry model cache
Best Practices for Production Deployment
Hardware Optimization Strategy
- NPU prioritization for power-efficient background processing
- GPU utilization for compute-intensive inference workloads
- CPU fallback ensuring compatibility across all hardware configurations
- Dynamic policy adjustment based on battery status and performance requirements
Model Selection and Management
- Inbox APIs for common scenarios requiring no model management
- Windows ML for custom models requiring specialized optimization
- Foundry Local for open-source models with cross-application sharing
- Hybrid deployment combining local inference with cloud capabilities
Development Workflow Optimization
- AI Dev Gallery for rapid prototyping and capability exploration
- Visual Studio integration for seamless project creation and deployment
- Azure AI Toolkit for model conversion, optimization, and fine-tuning
- Cross-platform testing ensuring compatibility across silicon vendors
Advanced Applications
Enterprise AI Deployment
IT Management Capabilities:
- Group policy integration for enterprise AI governance
- Model distribution through existing Windows Update infrastructure
- Compliance monitoring for regulated industries and data handling
- Performance analytics for optimization and resource planning
Developer Productivity Enhancement
Integrated Development Experience:
- One-click model integration from AI Dev Gallery to Visual Studio
- Automatic hardware optimization without manual configuration
- Shared model libraries reducing application size and complexity
- Real-time testing with immediate feedback on model performance
Innovative User Experience Patterns
New Application Categories:
- Proactive AI assistants running continuously with NPU efficiency
- Real-time content creation without cloud dependency or latency
- Privacy-preserving analytics with complete on-device processing
- Offline-first AI applications maintaining functionality without internet access
Resources and Further Learning
Official Documentation
- Windows AI Documentation - Comprehensive developer resources and API references
- AI Toolkit for VS Code - Model optimization and development tools
- AI Dev Gallery - Sample applications and interactive demos
Development Tools
- Windows ML NuGet Package - Core runtime and APIs
- WinApp SDK - Windows AI APIs integration
- Foundry Local - Open-source model management and execution
Additional Build 2025 Sessions
- Windows ML Deep Dive - Advanced implementation patterns and optimization
- Windows AI APIs Workshop - Hands-on development with inbox models
- Foundry Local Architecture - Technical details and deployment strategies
- AI Workstation Optimization - Hardware selection and configuration guidance
Community Engagement
- Email feedback: Windows AI team contact for scenarios and requirements
- Build booth visits - AI Workstation demonstrations and expert consultations
- Labs and breakouts - Hands-on experience with fine-tuning and customization
About the Speakers
Tucker Burns
GPM, Windows Platform + Developer Team
Microsoft
Group Program Manager focusing on AI initiatives within the Windows Developer Platform, leading the strategic direction for local AI capabilities and developer experience optimization.
Dian Hartono
Product Manager Lead, Windows Developer Platform Team
Microsoft
Product Manager Lead specializing in enabling developers to build AI experiences on Windows, with focus on API design, developer workflows, and cross-platform compatibility.
This session establishes Windows AI Foundry as Microsoft’s comprehensive platform for local AI development, demonstrating how developers can leverage on-device AI capabilities across diverse hardware while maintaining the flexibility to integrate with cloud services. The combination of ready-to-use APIs, custom model support, and open-source ecosystem access positions Windows as the premier platform for hybrid AI application development.